ReplayBuffer

class cpprb.ReplayBuffer(size, env_dict=None, next_of=None, *, stack_compress=None, default_dtype=None, Nstep=None, mmap_prefix=None, **kwargs)

Bases: object

Replay Buffer class to store transitions and to sample them randomly.

A transition can contain any values compatible with NumPy data types. Users can specify them freely through the env_dict parameter of the constructor.

A typical transition contains the observation (obs), action (act), reward (rew), next observation (next_obs), and done flag (done):

>>> import numpy as np
>>> env_dict = {"obs": {"shape": (4,4)},
...             "act": {"shape": 3, "dtype": np.int16},
...             "rew": {},
...             "next_obs": {"shape": (4,4)},
...             "done": {}}
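To make the "shape" and "dtype" defaults concrete, here is a minimal sketch of how such an env_dict could map to per-key storage. The helper storage_layout is hypothetical, dtypes are written as strings standing in for NumPy dtypes, and the (size, *shape) layout is an assumption of this sketch rather than a description of cpprb's allocator.

```python
# Hypothetical helper (not cpprb code): models how each env_dict entry could
# become a per-key storage shape of (size, *shape) with a dtype fallback.
# Dtypes are plain strings here, standing in for NumPy dtypes.

def storage_layout(size, env_dict, default_dtype="float32"):
    """Return {name: (array_shape, dtype)} for every entry of env_dict."""
    layout = {}
    for name, spec in env_dict.items():
        shape = spec.get("shape", 1)               # "shape" defaults to 1
        if isinstance(shape, int):
            shape = (shape,)                       # an int means a 1-D value
        dtype = spec.get("dtype", default_dtype)   # fall back to default_dtype
        layout[name] = ((size,) + tuple(shape), dtype)
    return layout


env_dict = {"obs": {"shape": (4, 4)},
            "act": {"shape": 3, "dtype": "int16"},
            "rew": {},
            "next_obs": {"shape": (4, 4)},
            "done": {}}

layout = storage_layout(100, env_dict)
print(layout["rew"])  # ((100, 1), 'float32')
```

In this model an entry such as "rew": {} still gets one column per step, which is why scalar values end up with a trailing dimension of 1.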

In this class, sampling is uniform random sampling with replacement, so the same transition can be chosen multiple times.

Methods Summary

add(self, **kwargs)

Add transition(s) into replay buffer.

clear(self)

Clear replay buffer.

get_all_transitions(self, bool shuffle)

Get all transitions stored in replay buffer.

get_buffer_size(self)

Get buffer size

get_current_episode_len(self)

Get current episode length

get_next_index(self)

Get the next index to store

get_stored_size(self)

Get stored size

is_Nstep(self)

Get whether Nstep is used

load_transitions(self, file)

Load transitions from file

on_episode_end(self)

Call on episode end

sample(self, batch_size)

Sample the stored transitions randomly with the specified size

save_transitions(self, file, *[, safe])

Save transitions to file

Methods Documentation

add(self, **kwargs)

Add transition(s) into replay buffer.

Multiple sets of transitions can be added at once.

Parameters

**kwargs (array like or float or int) – Transitions to be stored.

Returns

The first index of the stored position. If all transitions are stored into the NstepBuffer and no transitions are stored into the main buffer, None is returned.

Return type

int or None

Raises

KeyError – If any values defined at constructor are missing.

Warning

All values must be passed as keyword arguments (key-value style). It is the user's responsibility to ensure that all values have the same number of steps.
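The bookkeeping behind add can be pictured with a simplified pure-Python model. RingBufferModel is hypothetical and is not cpprb's implementation; it assumes a circular buffer whose index wraps around at the buffer size. Unlike the documented method, which leaves step-count consistency to the user, this model checks it explicitly.

```python
# Simplified model of ring-buffer bookkeeping (hypothetical class, not cpprb):
# add() writes one or more steps, wraps around at the end of the buffer,
# and returns the first index it wrote to.

class RingBufferModel:
    def __init__(self, size, names):
        self.size = size
        self.data = {name: [None] * size for name in names}
        self.index = 0          # next index to store
        self.stored_size = 0    # number of valid entries

    def add(self, **kwargs):
        missing = set(self.data) - set(kwargs)
        if missing:
            raise KeyError(f"missing values: {sorted(missing)}")
        # Wrap scalars so one step and a batch of steps are handled alike.
        values = {k: v if isinstance(v, (list, tuple)) else [v]
                  for k, v in kwargs.items()}
        lengths = {len(v) for v in values.values()}
        if len(lengths) != 1:   # cpprb leaves this check to the user
            raise ValueError("all values must have the same number of steps")
        n_steps = lengths.pop()
        first = self.index
        for step in range(n_steps):
            pos = (first + step) % self.size   # wrap around at the end
            for name in self.data:
                self.data[name][pos] = values[name][step]
        self.index = (first + n_steps) % self.size
        self.stored_size = min(self.stored_size + n_steps, self.size)
        return first


rb = RingBufferModel(4, ["obs", "done"])
print(rb.add(obs=[1, 2, 3], done=[0, 0, 0]))     # 0
print(rb.add(obs=[4, 5], done=[0, 1]))           # 3 (writes slots 3 and 0)
print(rb.data["obs"], rb.index, rb.stored_size)  # [5, 2, 3, 4] 1 4
```

Once the buffer is full, stored_size stays at the buffer size while the next index keeps cycling, so the oldest transitions are overwritten first.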

clear(self) → void

Clear replay buffer.

Set index and stored_size to 0.

Example

>>> from cpprb import ReplayBuffer
>>> rb = ReplayBuffer(5, {"done": {}})
>>> rb.add(done=1)
0
>>> rb.get_stored_size()
1
>>> rb.get_next_index()
1
>>> rb.clear()
>>> rb.get_stored_size()
0
>>> rb.get_next_index()
0
get_all_transitions(self, shuffle: bool = False)

Get all transitions stored in replay buffer.

Parameters

shuffle (bool, optional) – When True, transitions are shuffled. The default value is False.

Returns

transitions – All transitions stored in this replay buffer.

Return type

dict of numpy.ndarray
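Shuffling only makes sense if every key is reordered the same way, so that obs, act, rew and so on still describe one transition per row. The helper all_transitions below is a hypothetical model of that behaviour, using plain lists instead of NumPy arrays.

```python
import random

# Hypothetical helper (not cpprb code) modelling get_all_transitions(shuffle=...):
# every key is reordered with the SAME permutation so rows stay aligned.

def all_transitions(data, stored_size, shuffle=False, rng=random):
    order = list(range(stored_size))
    if shuffle:
        rng.shuffle(order)   # one permutation shared by all keys
    return {name: [values[i] for i in order] for name, values in data.items()}


data = {"obs": [10, 11, 12, 13], "act": [0, 1, 2, 3]}
shuffled = all_transitions(data, 4, shuffle=True, rng=random.Random(0))
# Pairs stay aligned: obs - 10 == act for every transition.
print(all(o - 10 == a for o, a in zip(shuffled["obs"], shuffled["act"])))  # True
```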

get_buffer_size(self) → size_t

Get buffer size

Returns

buffer size

Return type

size_t

get_current_episode_len(self) → size_t

Get current episode length

Returns

episode_len

Return type

size_t

get_next_index(self) → size_t

Get the next index to store

Returns

the next index to store

Return type

size_t

get_stored_size(self) → size_t

Get stored size

Returns

stored size

Return type

size_t

is_Nstep(self) → bool

Get whether Nstep is used

Returns

use_nstep

Return type

bool

load_transitions(self, file)

Load transitions from file

Parameters

file (str or file-like object) – File to read data from

Raises

ValueError – When the file format is wrong.

Warning

To avoid security vulnerabilities, you MUST NOT load untrusted files, since this method is based on pickle through joblib.load.
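The reason for this warning is that unpickling lets the data itself choose which callable runs. The harmless demo below uses only the standard pickle module (not joblib) and makes pickle.loads call len("abc"); a malicious file could name a far more dangerous callable in the same way.

```python
import pickle

# During unpickling, the (callable, args) pair recorded by __reduce__ is
# executed. Here it is the harmless len("abc"), but an attacker-controlled
# file could record any importable callable instead.

class Payload:
    def __reduce__(self):
        return (len, ("abc",))


blob = pickle.dumps(Payload())
print(pickle.loads(blob))  # 3, produced by calling len("abc") at load time
```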

on_episode_end(self) → void

Call on episode end

Finalize the current episode by moving remaining Nstep buffer transitions, evacuating overlapped data for memory compression features, and resetting episode length.

Notes

Calling this function at the end of each episode is the user's responsibility, since exploration can be terminated at a certain length even when the environment has not set any done flag.
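One reason the explicit call matters when Nstep is enabled: a transition can only receive its full N-step reward once N later rewards are known, so the last steps of an episode stay pending until something flushes them. NstepStaging is a hypothetical, reward-only model of that staging, assuming the standard discounted sum r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...; it is not cpprb's NstepBuffer.

```python
from collections import deque

# Hypothetical model of an N-step staging area (not cpprb's NstepBuffer):
# a step is released only once N rewards starting at that step are known.
# Whatever is still pending when the episode stops is flushed by
# on_episode_end() with a truncated discounted sum.

class NstepStaging:
    def __init__(self, n, gamma=0.99):
        self.n = n
        self.gamma = gamma
        self.pending = deque()   # rewards whose N-step return is incomplete
        self.released = []       # N-step returns handed to the main buffer

    def _discounted(self, rewards):
        return sum(self.gamma ** k * r for k, r in enumerate(rewards))

    def add(self, rew):
        self.pending.append(rew)
        if len(self.pending) == self.n:
            self.released.append(self._discounted(self.pending))
            self.pending.popleft()

    def on_episode_end(self):
        # Flush the tail: each remaining step gets a shorter return.
        while self.pending:
            self.released.append(self._discounted(self.pending))
            self.pending.popleft()


staging = NstepStaging(3, gamma=0.5)
for r in [1, 1, 1, 1]:       # episode cut off by a time limit, no done flag
    staging.add(r)
print(staging.released)      # [1.75, 1.75]
staging.on_episode_end()
print(staging.released)      # [1.75, 1.75, 1.5, 1.0]
```

Without the final on_episode_end() call in this model, the returns for the last two steps would never be released.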

sample(self, batch_size)

Sample the stored transitions randomly with the specified size

Parameters

batch_size (int) – sampled batch size

Returns

sample – A batch of batch_size sampled transitions, which might contain the same transition multiple times.

Return type

dict of ndarray
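Uniform sampling with replacement, applied with the same indices to every key, can be sketched as follows. The sample helper is a hypothetical stand-in written with the standard random module, not cpprb's sampler, and uses plain lists instead of NumPy arrays.

```python
import random

# Hypothetical helper (not cpprb code) modelling sample(batch_size):
# indices are drawn uniformly from [0, stored_size) WITH replacement, and
# the same indices are used for every key so each row is one transition.

def sample(data, stored_size, batch_size, rng=random):
    idx = [rng.randrange(stored_size) for _ in range(batch_size)]
    return {name: [values[i] for i in idx] for name, values in data.items()}


data = {"obs": [10, 11, 12], "rew": [0.0, 0.5, 1.0]}
batch = sample(data, 3, batch_size=8, rng=random.Random(1))
# batch_size (8) exceeds stored_size (3), so duplicates are guaranteed.
print(len(batch["obs"]), set(batch["obs"]) <= {10, 11, 12})  # 8 True
```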

save_transitions(self, file, *, safe=True)

Save transitions to file

Parameters
  • file (str or file-like object) – File to write data to

  • safe (bool, optional) – If False, more aggressive compression is attempted, which might be incompatible with future versions.

__init__()

Initialize ReplayBuffer

Parameters
  • size (int) – buffer size

  • env_dict (dict of dict, optional) – Dictionary specifying the stored values. The keys of env_dict become the value names. Each value of env_dict is itself a dict that defines “shape” (default 1) and “dtype” (falls back to default_dtype).

  • next_of (str or array like of str, optional) – The next items of the specified values (e.g. next_obs for obs) are also sampled, without storing duplicated values.

  • stack_compress (str or array like of str, optional) – Compress the memory of the specified stacked values.

  • default_dtype (numpy.dtype, optional) – Fallback dtype for values whose dtype is not specified in env_dict. The default is numpy.single.

  • Nstep (dict, optional) – Nstep[“size”] (int) is the number of steps N used for the N-step reward. Nstep[“rew”] (str or array like of str) names the rewards to be summed over N steps. Nstep[“gamma”] (float) is the discount factor; its default is 0.99. Nstep[“next”] (str or list of str) names the next values to be moved N steps ahead.

  • mmap_prefix (str, optional) – File name prefix for saving buffer data with mmap. If None (default), data is kept only in memory.
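The memory saving behind next_of rests on the fact that, within an episode, next_obs at step t equals obs at step t + 1. NextOfModel is a hypothetical single-episode sketch of that idea; the wrap-around and episode-boundary handling that a real ring buffer needs are deliberately omitted, and it assumes the caller passes consistent obs/next_obs pairs.

```python
# Hypothetical single-episode model of the next_of idea (not cpprb's layout):
# observations are stored once, and only the newest next_obs needs its own
# slot, because next_obs[t] == obs[t + 1] for every earlier step.

class NextOfModel:
    def __init__(self):
        self.obs = []          # obs_0 ... obs_{T-1}
        self.last_next = None  # next_obs of the most recent step

    def add(self, obs, next_obs):
        self.obs.append(obs)
        self.last_next = next_obs   # older next_obs values are not kept

    def next_obs(self, t):
        # Read next_obs[t] by shifting the index by one.
        return self.obs[t + 1] if t + 1 < len(self.obs) else self.last_next


m = NextOfModel()
for t in range(3):
    m.add(obs=t, next_obs=t + 1)
print([m.next_obs(t) for t in range(3)])  # [1, 2, 3]
print(len(m.obs) + 1)                     # 4 stored values instead of 6
```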

_encode_sample(self, idx)
_load_transitions_v1(self, data)